Analysis of transition cost and model parameters in speaker diarization for meetings

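Abstract

There has been little work in the literature on speaker diarization of meetings with multiple distant microphones since the publications in 2012 related to the last National Institute of Standards and Technology (NIST) Rich Transcription Evaluation Campaign in 2009 (RT09). Lately, the Second DIHARD Challenge also covered diarization of dinner-party meetings that include multiple distant microphones. Dinner-party meetings are somewhat harder than office meetings because their participants can move freely around the room. In this paper, we have studied some algorithms for the NIST Rich Transcription evaluations of 2007 (RT07) and RT09 that provide definite and clear improvements. On the one hand, little or no care has been taken with the problem of penalizing or favoring transitions between speakers, other than proposing a minimum duration of a speaker turn or calculating speakers' probabilities using Variational Bayes (VB). We have studied this issue and determined that a transition penalty term is needed, and that it should be independent of both the number of active speakers and the minimum duration of speaker turns. On the other hand, determining a method to automatically select the right number of parameters is crucial in developing good models of the speakers. Previous studies have proposed a dynamic selection of the number of parameters based on the duration of the speaker's speech, with mixed performance when tested on multiple distant microphone meetings. We propose a new method that takes into account the duration of the speaker's speech to determine the number of parameters, the question of overfitting and the maximum number of parameters, and the computation time, in order to reduce it. We have carried out experiments to support our findings, and we have been able to improve the baseline diarization error rate on multiple distant-microphone meetings. Both methods achieve improved performance over the baseline. The first method obtains a 21.6% relative decrease in error rate on the development set and 4.6% on the test set. The second method obtains a 46.47% relative decrease on the development set and 17.54% on the test set. The two methods complement each other: when they are applied in combination, we obtain a 47.2% relative decrease on the development set and 22.02% on the test set. The results obtained with this proposal are outstanding for some subsets such as RT07 and are among the best for RT09 for simple modifications. Furthermore, the proposed algorithm gains in computation time without jeopardizing performance. Results on a different publicly available database, augmented multiparty interaction (AMI), show a 28.44% relative error decrease, confirming the validity of the methods. Preliminary experiments with a single stream (MFCC) endorse our findings. Comparisons with an x-vector system deliver superior results on unseen data.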
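The abstract does not give the decoding equations, so the following is only a minimal sketch of the general idea of a transition penalty: a Viterbi pass over per-frame speaker log-likelihoods in which a constant cost is subtracted whenever the speaker changes. The function name `viterbi_with_switch_penalty`, the toy scores, and the penalty values are assumptions made for illustration and are not taken from the paper. The penalty here is a single constant that does not depend on the number of speakers, which is in the spirit of the independence property the abstract asks for, but it is not claimed to be the authors' formulation.

```python
def viterbi_with_switch_penalty(loglik, penalty):
    """Best speaker path for loglik[t][s] (frame t, speaker s).

    penalty is a hypothetical constant cost subtracted at every speaker change.
    """
    num_speakers = len(loglik[0])
    score = list(loglik[0])          # best path score ending in each speaker
    back = []                        # back[t-1][j]: best previous speaker for j at frame t
    for t in range(1, len(loglik)):
        new_score, pointers = [], []
        for j in range(num_speakers):
            # Staying with the same speaker is free; switching costs `penalty`.
            best_i = max(
                range(num_speakers),
                key=lambda i: score[i] - (penalty if i != j else 0.0),
            )
            best = score[best_i] - (penalty if best_i != j else 0.0)
            new_score.append(best + loglik[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final speaker.
    path = [max(range(num_speakers), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores: speaker 0 is favored except for a one-frame glitch at frame 2.
toy = [[0.0, -1.0], [0.0, -1.0], [-1.0, 0.0], [0.0, -1.0], [0.0, -1.0]]
print(viterbi_with_switch_penalty(toy, 0.0))  # [0, 0, 1, 0, 0]
print(viterbi_with_switch_penalty(toy, 2.0))  # [0, 0, 0, 0, 0]
```

With no penalty the decoder follows the one-frame glitch and produces a spurious speaker turn; with a penalty larger than the gain from switching, the turn is suppressed.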

Related resources

Robust Speaker Diarization for meetings

This thesis presents research on speaker diarization for meeting rooms. It examines the algorithms and the implementation of an offline speaker segmentation and clustering system for meeting recordings where usually more than one microphone is available. The main research and system implementation were done while visiting the International Computer Science Institute...

Speaker Diarization in Meetings Domain

The purpose of this study is to develop robust techniques for speaker segmentation and clustering with a focus on the meetings domain. The techniques examined can, however, be applied to other domains such as telephone and broadcast news. Traditional techniques for speaker diarization developed for telephone conversations or broadcast news are based on a single channel, which is notably different f...

Improving speaker diarization for CHIL lecture meetings

Speaker diarization is often performed before automatic speech recognition (ASR) to label speaker segments. In this paper we present two simple schemes to improve the speaker diarization performance. The first is to iteratively refine GMM speaker models by frame level re-labeling and smoothing of the decision likelihood. The second is to use word level alignment information from the ASR process...

Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System

In this paper we present the ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s) [1] conducted on the meetings environment. The presented system is based on the RT05s system, which uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to decide which pairs of clusters to merge and to determine when to stop merging clu...

Multi-stage Speaker Diarization for Conference and Lecture Meetings

The LIMSI RT-07S speaker diarization system for the conference and lecture meetings is presented in this paper. This system builds upon the RT06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on Bayesian information criterion (BIC) with a second clustering using state-of-the-art speaker identification (SID) techniques. Since the baseli...

Journal

Journal title: EURASIP Journal on Audio, Speech, and Music Processing

Year: 2021

ISSN: 1687-4722, 1687-4714

DOI: https://doi.org/10.1186/s13636-021-00196-6